-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[helm-chart] add values for setting annotations and labels for rollout-operator #6733
base: main
Are you sure you want to change the base?
Conversation
@56quarters tagging you here as you have some context for these changes, as you're also tagged here and i think it makes sense for these things to be aligned. |
7f9e087
to
840edc0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this!
@@ -53,11 +53,20 @@ spec: | |||
metadata: | |||
labels: | |||
{{- include "mimir.podLabels" $args | nindent 8 }} | |||
grafana.com/no-downscale: {{$rolloutZone.noDownscale}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed for the alertmanager, I'm pretty sure they're safe to scale down at any time. Could one of the alerting folks confirm? @stevesg et al
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might also be the case for the store-gateway
right?
the resource that needs extra safety when scaling down is mainly the ingester if i'm not mistaken
@@ -52,8 +52,17 @@ spec: | |||
metadata: | |||
labels: | |||
{{- include "mimir.podLabels" $args | nindent 8 }} | |||
grafana.com/no-downscale: {{$rolloutZone.noDownscale}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two annotations are mutually exclusive:
- A statefulset can either be "no-downscale" in which case, it's not allowed to be scaled down by anything.
- Or it can be scaled down by the rollout-operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense that the no-downscale
is the deciding factor here?
by that i mean that the condition for rendering the prepare-scaledown
is that it's value is true and that no-downscale
is false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimitarvdimitrov has suggested that no downscale
be the default behavior which slightly simplifies this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I was interfering. I think having the prepare-scaledown and graceful scaledown procedure by the rollout operator are easier to reason about from an operator's POV - you just scale the StatfulSets as you would normally and the rollout-operator sorts out preparations and wait times.
But I suspect Nick's point was that setting both doesn't make sense. So it would be useful if we add some validations that they both cannot be set. Perhaps you can add a named template (helm function) to this file which checks the values of mimir.podLabels and mimir.podAnnotations and fails the chart rendering when both are set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i see, i didn't know about this possibility!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I didn't make this clear - I think it's better if we have the prepare-downscale and the downscale leader on by default. This would remove some toil from operators as opposed to only preventing accidental downscaling.
grafana.com/prepare-downscale-http-port: 80 | ||
{{- end-}} | ||
{{- if $rolloutZone.downscaleLeader }} | ||
grafana.com/rollout-downscale-leader: downscaleLeader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of this should be the name of the leader statefulset. For example, the leader of the zone B statefulset should be mimir-ingester-zone-a
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my mistake, i meant to add this as a variable.
annotations: | ||
{{- include "mimir.podAnnotations" $args | nindent 8 }} | ||
{{- if $rolloutZone.prepareDownscale }} | ||
grafana.com/prepare-downscale-http-path: ingester/prepare-shutdown | ||
grafana.com/prepare-downscale-http-port: 80 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember if the default HTTP port for Mimir is changed in the helm chart. This might be 8080
these days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dynamic based on {{ include "mimir.serverHttpListenPort" . }}
. It's set in the same file. I think we should use the same template here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the contribution! Can you also add a changelog entry in operations/helm/charts/mimir-distributed/CHANGELOG.md
@@ -950,6 +974,14 @@ ingester: | |||
# If set to "-", then use `storageClassName: ""`, which disables dynamic provisioning | |||
# If undefined or set to null (the default), then fall back to the value of `ingester.persistentVolume.storageClass`. | |||
storageClass: null | |||
# -- noDownscale adds a label that can be used by the rollout-operator as documented in the rollout-operator repo: https://github.com/grafana/rollout-operator#webhooks | |||
noDownscale: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we enable this by default in all ingester zones? The idea behind this annotation was to prevent accidental downscaling
operations/helm/charts/mimir-distributed/templates/store-gateway/store-gateway-statefulset.yaml
Outdated
Show resolved
Hide resolved
annotations: | ||
{{- include "mimir.podAnnotations" $args | nindent 8 }} | ||
{{- if $rolloutZone.prepareDownscale }} | ||
grafana.com/prepare-downscale-http-path: ingester/prepare-shutdown | ||
grafana.com/prepare-downscale-http-port: 80 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dynamic based on {{ include "mimir.serverHttpListenPort" . }}
. It's set in the same file. I think we should use the same template here
840edc0
to
446ac64
Compare
…t-operator Signed-off-by: sam clulow <sam.clulow@springernature.com>
- default `no-downscale` to true - downscale labels are now conditionally rendered - fix path and port
Signed-off-by: sam clulow <sam.clulow@springernature.com>
a844db4
to
6ff0205
Compare
"noDownscale": $rolloutZone.noDownscale | ||
"downscaleLeader": $rolloutZone.downscaleLeader | ||
"prepareDownscale": $rolloutZone.prepareDownscale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the linter is complaining about the colons here
hey, there's something off with the commits in this PR. I see quite a lot of changes and number of commits. Can you take another look please @m1rp? |
Signed-off-by: sam clulow <sam.clulow@springernature.com>
4b4a6c4
to
2df4381
Compare
apologies, i managed to spectacularly mess up my git history/commit! should be sorted now. |
really sorry for the delay 🙈 I was away the week before and swiped away all GH notifications I can try to take a look later this week after GrafanaCon. @56quarters can you give this one more look? |
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dropped a few long-overdue comments. Apologies once again for the delay
# -- downscaleLeader is an Annotation used by the rollout-operator | ||
# If defined, downscaleLeader: <downscaleLeader> | ||
# If undefined or set to null (the default), no Annotation is set | ||
downscaleLeader: null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be great if we can set the downscale leader automatically in the zonal templates. That way automatica downscaling can be just a single boolean toggle for the ingester/store-gateway which automatically sets up downscale leader and prepare downscale for the 3 zones.
But i think this PR has been lingering for long enough to try to slow it down further. happy to not include this change in the PR if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having said that - i think having noDownscale: false
would be a safer change because it will mean folks can keep running the chart as they used to. If we get automatic downscaling automatically set up like I suggested above, then I think it's viable to have it on by default.
I think it makes sense to try to minimize the added manual effort for folks after they upgrade to a chart version which contains this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think that changing noDownscale: false
makes sense. and as you mentioned the PR has been lingering for a while and i just don't have the bandwidth to deal with it.
{{- if $rolloutZone.prepareDownscale }} | ||
grafana.com/prepare-downscale-http-path: store-gateway/prepare-shutdown | ||
grafana.com/prepare-downscale-http-port: {{ include "mimir.serverHttpListenPort" . }} | ||
{{- end-}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this tag is invalid - it should be {{- end -}}
(extra space) It's also invalid for the other two templates
{{- if (eq $rolloutZone.noDownscale true )}} | ||
grafana.com/no-downscale: {{$rolloutZone.noDownscale}} | ||
{{- else }} | ||
grafana.com/prepare-downscale: {{$rolloutZone.prepareDownscale}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the annotation even if prepareDownscale is false? I think we can just not apply it
{{- if $rolloutZone.prepareDownscale }} | ||
grafana.com/prepare-downscale-http-path: ingester/prepare-shutdown | ||
grafana.com/prepare-downscale-http-port: {{ include "mimir.serverHttpListenPort" . }} | ||
{{- end-}} | ||
{{- if $rolloutZone.downscaleLeader }} | ||
grafana.com/rollout-downscale-leader: $rolloutZone.downscaleLeader | ||
{{- end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should also set the grafana.com/min-time-between-zones-downscale=12h
label on ingesters. store-gateways can work with about 30m (in case lazy loading is disabled). alertmanagers shouldn't need this config
metadata: | ||
labels: | ||
{{- include "mimir.podLabels" $args | nindent 8 }} | ||
{{- if (eq $rolloutZone.noDownscale true )}} | ||
grafana.com/no-downscale: {{$rolloutZone.noDownscale}} | ||
{{- else }} | ||
grafana.com/prepare-downscale: {{$rolloutZone.prepareDownscale}} | ||
{{- end }} | ||
annotations: | ||
{{- include "mimir.podAnnotations" $args | nindent 8 }} | ||
{{- if $rolloutZone.prepareDownscale }} | ||
grafana.com/prepare-downscale-http-path: store-gateway/prepare-shutdown | ||
grafana.com/prepare-downscale-http-port: {{ include "mimir.serverHttpListenPort" . }} | ||
{{- end-}} | ||
{{- if $rolloutZone.downscaleLeader }} | ||
grafana.com/rollout-downscale-leader: $rolloutZone.downscaleLeader | ||
{{- end }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these annotations and labels should be set on the stateful set not on the pods themselves.
that's also what jsonnet does
mimir/operations/mimir/ingester-automated-downscale.libsonnet
Lines 34 to 37 in b9d91bd
statefulSet.mixin.metadata.withAnnotationsMixin({ | |
'grafana.com/prepare-downscale-http-path': 'ingester/prepare-shutdown', | |
'grafana.com/prepare-downscale-http-port': '80', | |
}), |
these are the rollouts operator docs https://github.com/grafana/rollout-operator?tab=readme-ov-file#how-scaling-up-and-down-works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this true of the labels as well? nevermind i looked at the jsonnet and answered my own question i think
i also merged with upstream and did some trivial changes in the changelog and some comments in the values.yaml. |
@m1rp Thank you for the contribution, can you please do the changes mentioned in the comments? Otherwise, we will close this PR. |
The CHANGELOG has just been cut to prepare for the next release. Please rebase |
What this PR does
add labels and annotations used by the rollout-operator.
Which issue(s) this PR fixes or relates to
Fixes #6650
Also very related to This PR over in the rollout-operator repo
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.Extra
i'd love some feedback on this, because not all possible annotation/labels have been added nor are they configurable so some input would be appreciated if this is something worth investing more time into.